Abstract

We investigate the problem of decoding a bar code from a signal measured
with a hand-held laser-based scanner. Rather than formulating the inverse
problem as one of binary image reconstruction, we instead incorporate the
symbology of the bar code into the reconstruction algorithm directly, and
search for a sparse representation of the UPC bar code with respect to this
known dictionary. Our approach significantly reduces the degrees of freedom
in the problem, allowing for accurate reconstruction that is robust to noise
and unknown parameters in the scanning device. We propose a greedy
reconstruction algorithm and provide robust reconstruction guarantees.
Numerical examples illustrate the insensitivity of our symbology-based
reconstruction to both imprecise model parameters and noise on the scanned
measurements.

1 Introduction 

This work concerns an approach for decoding bar code signals. While it is true
that bar code scanning is essentially a solved problem in many domains, as
evidenced by its prevalent use, there is still a need for more reliable decoding
algorithms in situations where the signals are highly corrupted and the
scanning takes place in less than ideal situations. It is under these conditions
that traditional bar code scanning algorithms often fail.

The problem of bar code decoding may be viewed as the deconvolution of 
a binary one-dimensional image involving unknown parameters in the blurring 
kernel that must be estimated from the signal [5]. Esedoglu [5] was the first
to provide a mathematical analysis of the bar code decoding problem in this 
context, and he established the first uniqueness result of its kind for this prob- 
lem. He further showed that the blind deconvolution problem can be formulated 
as a well-posed variational problem. An approximation, based on the Modica- 
Mortola energy [11], is the basis for the computational approach. The approach
has recently been given further analytical treatment in [3- 

A recent work [2] addresses the case where the blurring is not very severe.
Indeed, the authors were able to treat the signal as if it had not been blurred.
They showed rigorously that the variational framework can recover the true bar
code image even if the blur parameter is not known. A later paper [3] considers
the case where the blurring is large and its parameter value is known. However,
none of these papers deals rigorously with noise, although their numerical
simulations included noise. For an analysis of the deblurring problem where the
blur is large and noise is present, the reader is referred to [7].

The approach presented in this work departs from the above image-based 
approaches. We treat the unknown as a finite-dimensional code and develop a 
model that relates the code to the measured signal. We show that by exploiting 
the symbology - the language of the bar code - a bar code can be identified 
with a sparse representation in the symbology dictionary. We develop a recovery 
algorithm that fits the observed signal to a code from the symbology in a greedy 
fashion, iterating in one pass from left to right. We prove that the algorithm 
can tolerate a significant level of blur and noise. We also verify insensitivity of 
the reconstruction to imprecise parameter estimation of the blurring function. 

We were unable to find any previous symbol-based methods for bar code 
decoding in the open literature. In a related approach a genetic algorithm 
is utilized to represent populations of candidate barcodes together with likely 
blurring and illumination parameters from the observed image data. Successive 
generations of candidate solutions are then spawned from those best matching 
the input data until a stopping criterion is met. That work differs from the 
current article in that it uses a different decoding method and does not utilize the 
relationship between the structure of the barcode symbology and the blurring 
kernel. 

We note that there is a symbol-based approach for super-resolving scanned 
images [1]. However, that work is statistical in nature whereas the method we 
present is deterministic. Both this work and the super-resolution work are simi- 
lar in spirit to lossless data compression algorithms known as 'dictionary coding' 
(see, e.g., [12]) which involve matching strings of text to strings contained in an
encoding dictionary. 

The outline of the paper is as follows. We start by developing a model 
for the scanning process. In Section 3, we study the properties of the UPC 
(Universal Product Code) bar code and provide a mathematical representation 
for the code. Section 4 develops the relation between the code and the measured 
signal. An algorithm for decoding bar code signals is presented in Section 5. 
Section 6 is devoted to the analysis of the algorithm proposed. Results from 
numerical experiments are presented in Section 7, and a final section concludes 
the work with a discussion.



2 A scanning model and associated inverse problem

A bar code is scanned by shining a narrow laser beam across the black-and- 
white bars at constant speed. The amount of light reflected as the beam moves 
is recorded and can be viewed as a signal in time. Since the bar code consists of 
black and white segments, the reflected energy is large when the beam is on the 
white part, and small when the beam is on the black part. The reflected light 
energy at a given position is proportional to the integral of the product of the 
beam intensity, which can be modeled as a Gaussian,^1 and the bar code image
intensity (white is high intensity, black is low). The recorded data are samples 
of the resulting continuous time signal. 
Figure 1: Samples of the binary bar code function z(t) and the UPC bar code.
Note that in UPC bar codes, each bar - black or white - is a multiple of the 
minimum bar width. 



^1 This Gaussian model has also been utilized in many previous treatments of the
bar code decoding problem. See, e.g., [8] and references therein.
Let us write the Gaussian beam intensity as a function of time:

$$ g(t) = \alpha \, \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-t^2/(2\sigma^2)}. \qquad (1) $$



There are two parameters: (i) the variance $\sigma^2$ and (ii) the constant multiplier
$\alpha$. We will overlook the issue of relating time to the actual position of the laser
beam on the bar code, which is measured in distance. We can do this because 
only relative widths of the bars are important in their encoding. 

Because the bar code - denoted by $z(t)$ - represents a black and white image,
we will normalize it to be a binary function. Then the sampled data are

$$ d_i = \int g(t_i - \tau) \, z(\tau) \, d\tau + h_i, \quad i \in [m], \qquad (2) $$

where the $t_i \in [0, n]$ are equally spaced discretization points, and the
$h_i$ represent the noise associated with scanning. We have used the notation
$[m] = \{1, 2, \dots, m\}$. We need to consider the relative size of the laser
beam spot to the width of the narrowest bar in the bar code. We set the minimum
bar width to be 1 in the artificial time measure.

It remains to explain the roles of the parameters in the Gaussian beam
intensity. The variance $\sigma^2$ models the distance from the scanner to the
bar code, with larger variance signifying longer distance. The width of a
Gaussian represents the length of the interval, centered around the Gaussian
mean, over which the Gaussian is greater than half its maximum amplitude; it is
given by $2\sqrt{2 \ln 2}\,\sigma$. Informally, the Gaussian blur width should
be of the same order of magnitude as the minimum bar width in the bar code for
reconstruction to be possible. The multiplier $\alpha$ lumps the conversion from
light energy interacting with a binary bar code image to the measurement. Since
the distance to the bar code is unknown and the intensity-to-voltage conversion
depends on ambient light and properties of the laser/detector, these parameters
are assumed to be unknown.
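As a quick numerical check of the width formula above, the following Python
sketch converts between the standard deviation and the half-maximum width of
the Gaussian. The function names are illustrative choices and do not come from
the paper.

```python
import math

# Conversion factor between standard deviation and half-maximum width.
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def blur_width(sigma):
    """Width of the interval on which the Gaussian exceeds half its maximum."""
    return FWHM_FACTOR * sigma


def sigma_for_width(width):
    """Standard deviation whose half-maximum width equals the given width."""
    return width / FWHM_FACTOR


# Standard deviation at which the blur width equals the unit bar width.
sigma_unit = sigma_for_width(1.0)
```

With the minimum bar width normalized to 1, `sigma_unit` is the standard
deviation at which the blur width equals one bar.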

To develop the model further, consider the characteristic function

$$ \chi(t) = \begin{cases} 1 & \text{for } 0 \le t \le 1, \\ 0 & \text{else.} \end{cases} $$

Then the bar code function can be written as

$$ z(t) = \sum_{j=1}^{n} c_j \, \chi(t - (j-1)), \qquad (3) $$

where the coefficients $c_j$ are either 0 or 1 (see, e.g., Figure 1). The
sequence $c_1, c_2, \dots, c_n$ represents the information stored in the bar
code, with a '0' corresponding to a white bar of unit width and a '1'
corresponding to a black bar of unit width. For UPC bar codes, the total number
of unit widths, $n$, is fixed to be 95 for a 12-digit code (further explanation
follows in Section 3).
Remark 2.1. One can think of the sequence $\{c_1, c_2, \dots, c_n\}$ as an
instruction for printing a bar code. Every $c_i$ is a command to lay out a white
bar if $c_i = 0$, or a black bar otherwise.

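Substituting the bar code representation (3) back in (2), the sampled data can
be represented as follows:

$$ d_i = \int g(t_i - t) \Big[ \sum_{j=1}^{n} c_j \, \chi(t - (j-1)) \Big] dt + h_i
       = \sum_{j=1}^{n} \Big[ \int_{j-1}^{j} g(t_i - t) \, dt \Big] c_j + h_i. $$

In terms of the matrix $\mathcal{G} = \mathcal{G}(\sigma)$ with entries

$$ \mathcal{G}_{kj} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{j-1}^{j} e^{-(t_k - t)^2/(2\sigma^2)} \, dt, \quad k \in [m], \ j \in [n], \qquad (4) $$

the bar code determination problem reads

$$ d = \alpha \, \mathcal{G}(\sigma) \, c + h. \qquad (5) $$

The matrix entries $\mathcal{G}_{kj}$ are illustrated in Figure 2. In the
sequel, we will assume this discrete version of the bar code problem. While it
is tempting to solve (5) directly for $c$, $\sigma$ and $\alpha$, the best
approach for doing so is not obvious. The main difficulty stems from the fact
that $c$ is a binary vector, while the Gaussian parameters are continuous
variables.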
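The entries in (4) are differences of the standard normal distribution function
evaluated at the two ends of each unit bar, so they can be computed without
numerical integration. The sketch below is one way to do this in Python. The
function names and the placement of the sample times at the midpoints
$t_k = (k - 1/2)/r$ are assumptions made for illustration; the paper only
states that the $t_k$ are equally spaced in $[0, n]$.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_matrix(sigma, n=95, r=10):
    """Return G(sigma) as a list of m = r * n rows of length n.

    Entry (k, j) is the integral over [j - 1, j] of the unit-area Gaussian
    centred at the sample time t_k (midpoint sampling is an assumption).
    """
    rows = []
    for k in range(1, r * n + 1):
        t_k = (k - 0.5) / r
        rows.append([normal_cdf((t_k - (j - 1)) / sigma) - normal_cdf((t_k - j) / sigma)
                     for j in range(1, n + 1)])
    return rows


def forward_data(alpha, sigma, c, r=10):
    """Noise-free data d = alpha * G(sigma) * c for a 0/1 list c."""
    G = gaussian_matrix(sigma, n=len(c), r=r)
    return [alpha * sum(g * c_j for g, c_j in zip(row, c)) for row in G]
```

For example, `forward_data(1.0, 0.5, [1, 0, 1])` returns the blurred samples of
the start pattern alone.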







Figure 2: The matrix element Gkj is calculated by placing a scaled Gaussian 
over the bar code grid and integrating over each of the bar code intervals. 



3 Incorporating the UPC bar code symbology 

We now tailor the bar code reading problem to UPC bar codes, although we
remark that our framework should apply to any bar code of fixed length.

SYMBOL-BASED BAR CODE DECODING

In the UPC-A symbology, a bar code represents a 12-digit number. If we ignore
the check-sum requirement, then any 12-digit number is permitted, and the
number of unit widths, $n$, is fixed to 95. Going from left to right, the UPC
bar code has 5 parts: the start sequence, the codes for the first 6 digits, the
middle sequence, the codes for the next 6 digits, and the end sequence. Thus
the bar code has the following structure:

$$ S \, L_1 L_2 L_3 L_4 L_5 L_6 \, M \, R_1 R_2 R_3 R_4 R_5 R_6 \, E, \qquad (6) $$

where $S$, $M$, and $E$ are the start, middle, and end patterns respectively,
and $L_i$ and $R_i$ are patterns corresponding to the digits.

In the sequel, we represent a white bar of unit width by 0 and a black bar
by 1 in the bar code representation $\{c_i\}$.^2 The start, middle, and end
patterns are fixed and given by

S = E = [101], M = [01010].

The patterns for $L_i$ and $R_i$ are taken from the following table:



digit   L-pattern   R-pattern
0       0001101     1110010
1       0011001     1100110
2       0010011     1101100
3       0111101     1000010
4       0100011     1011100
5       0110001     1001110
6       0101111     1010000
7       0111011     1000100
8       0110111     1001000
9       0001011     1110100
                                (7)



Note that the right patterns are just the left patterns with the 0's and 1's
flipped. It follows that the bar code can be represented as a binary vector
$c \in \{0, 1\}^{95}$. However, not every binary vector constitutes a bar code:
only $10^{12}$ of the possible $2^{95}$ binary sequences of length 95 (fewer
than $10^{-14}$ %) are bar codes. Specifically, the bar code structure (6)
indicates that bar codes have specific sparse representations in the bar code
dictionary constructed as follows: write the left-integer and right-integer
codes as columns of 7-by-10 matrices,

$$ L = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}, \qquad
R = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}. $$

^2 Note that identifying white bars with 0 and black bars with 1 runs counter to
the natural light intensity of the reflected laser beam. However, it is the
black bars that carry information.

The start and end patterns, $S$ and $E$, are 3-dimensional vectors, while the
middle pattern $M$ is a 5-dimensional vector:

$$ S = E = [1 \ 0 \ 1]^T, \quad M = [0 \ 1 \ 0 \ 1 \ 0]^T. $$



The bar code dictionary is the 95-by-123 block diagonal matrix

$$ \mathcal{D} = \operatorname{blockdiag}(S, L, L, L, L, L, L, M, R, R, R, R, R, R, E), $$

in which all off-diagonal blocks are zero.

The bar code (6), expanded in the bar code dictionary, has the form

$$ c = \mathcal{D} x, \quad x \in \{0, 1\}^{123}, \qquad (8) $$

where

1. The 1st, 62nd and 123rd entries of $x$, corresponding to the S, M, and
E patterns, are 1.

2. Among the 2nd through 11th entries of $x$, exactly one entry - the entry
corresponding to the first digit in $c = \mathcal{D}x$ - is nonzero. The same
is true for the 12th through 21st entries, and so on, until the 61st entry.
This pattern starts again from the 63rd entry through the 122nd entry. In all,
$x$ has exactly 15 nonzero entries.
That is, $x$ must take the form

$$ x^T = [1, v_1^T, \dots, v_6^T, 1, v_7^T, \dots, v_{12}^T, 1], \qquad (9) $$

where $v_j$, for $j = 1, \dots, 12$, are vectors in $\{0, 1\}^{10}$ having only
one nonzero element. In this new representation, the bar code reconstruction
problem (5) reads

$$ d = \alpha \, \mathcal{G}(\sigma) \, \mathcal{D} x + h, \qquad (10) $$

where $d \in \mathbb{R}^m$ is the measurement vector, the matrices
$\mathcal{G}(\sigma) \in \mathbb{R}^{m \times 95}$ and
$\mathcal{D} \in \{0, 1\}^{95 \times 123}$ are the Gaussian matrix of (4) and
the bar code dictionary defined above, respectively, and $h \in \mathbb{R}^m$ is
additive noise. Note that $\mathcal{D}$ has fewer rows than columns, while
$\mathcal{G}$ will generally have more rows than columns; we will refer to the
ratio of rows to columns of $\mathcal{G}$ as the oversampling ratio and denote
it by $r = m/n$. Given the data $d \in \mathbb{R}^m$, our objective is to return
a valid bar code $x \in \{0, 1\}^{123}$ as reliably and quickly as possible.
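The dictionary and the sparse vector in (9) are easy to build explicitly. The
following Python sketch constructs the 123 dictionary columns, the vector $x$
for a given list of 12 digits, and the product $c = \mathcal{D}x$. The function
names are illustrative, and the check-sum requirement is ignored, as in the
text.

```python
L_PATTERNS = ["0001101", "0011001", "0010011", "0111101", "0100011",
              "0110001", "0101111", "0111011", "0110111", "0001011"]
# Right patterns are the left patterns with 0's and 1's flipped.
R_PATTERNS = [p.translate(str.maketrans("01", "10")) for p in L_PATTERNS]
START = END = "101"
MIDDLE = "01010"


def dictionary_columns():
    """Columns of the 95-by-123 dictionary D, each a list of 95 bits."""
    blocks = [[START]] + [L_PATTERNS] * 6 + [[MIDDLE]] + [R_PATTERNS] * 6 + [[END]]
    n = sum(len(block[0]) for block in blocks)
    columns, offset = [], 0
    for block in blocks:
        width = len(block[0])
        for pattern in block:
            column = [0] * n
            column[offset:offset + width] = [int(b) for b in pattern]
            columns.append(column)
        offset += width
    return columns


def sparse_vector(digits):
    """The vector x of form (9) for a list of 12 digits."""
    assert len(digits) == 12
    x = [1]                                   # start pattern
    for position, digit in enumerate(digits):
        if position == 6:
            x.append(1)                       # middle pattern
        x += [1 if k == digit else 0 for k in range(10)]
    x.append(1)                               # end pattern
    return x


def encode(digits):
    """The bar code c = D x as a list of 95 bits."""
    x = sparse_vector(digits)
    columns = dictionary_columns()
    return [sum(col[i] * x_k for col, x_k in zip(columns, x)) for i in range(95)]
```

Calling `encode` on any 12 digits yields a length-95 binary vector whose
sub-blocks are exactly the table patterns of the chosen digits.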



4 Properties of the forward map 



Incorporating the bar code dictionary into the inverse problem (10), we see that
the map between the bar code and observed data is represented by the matrix
$\mathcal{P} = \alpha \, \mathcal{G}(\sigma) \, \mathcal{D} \in \mathbb{R}^{m \times 123}$.
We will refer to $\mathcal{P}$, which is a function of the model parameters
$\alpha$ and $\sigma$, as the forward map.



4.1 Near block-diagonality 

For reasonable levels of blur in the Gaussian kernel, the forward map
$\mathcal{P}$ inherits an almost block-diagonal structure from the bar code
matrix $\mathcal{D}$, as illustrated in Figure 3. In the limit as the amount of
blur $\sigma \to 0$, the forward map $\mathcal{P}$ becomes exactly the
block-diagonal bar code matrix. More precisely, we partition the forward map
$\mathcal{P}$ according to the block-diagonal structure of the bar code
dictionary $\mathcal{D}$,

$$ \mathcal{P} = \big[ P^{(1)} \ P^{(2)} \ \dots \ P^{(15)} \big]. \qquad (11) $$

The 1st, 8th, and 15th sub-matrices are special as they correspond to the known
start, middle, and end patterns of the bar code. In accordance with the
structure of $x$ where $c = \mathcal{D}x$, these sub-matrices are column vectors
of length $m$,

$$ P^{(1)} = p^{(1)}, \quad P^{(8)} = p^{(8)}, \quad \text{and} \quad P^{(15)} = p^{(15)}. $$

The remaining sub-matrices are blurred versions of the left-integer and
right-integer codes $L$ and $R$, represented as $m$-by-10 nonnegative real
matrices. We write each of them as

$$ P^{(j)} = \big[ p_1^{(j)} \ p_2^{(j)} \ \dots \ p_{10}^{(j)} \big], \quad j \ne 1, 8, 15, \qquad (12) $$

where each $p_k^{(j)}$, $k = 1, 2, \dots, 10$, is a column vector of length $m$.

Figure 3: A representative bar code forward map
$\mathcal{P} = \alpha \, \mathcal{G}(\sigma) \, \mathcal{D}$ corresponding to
oversampling parameter $r = 10$, amplitude $\alpha = 1$, and Gaussian standard
deviation $\sigma = 1.5$. The lone column vectors at the start, middle, and end
account for the known start, middle, and end patterns in the bar code.

Recall that the over-sampling rate $r = m/n$ indicates the number of time
samples per minimal bar code width. Given $r$, we can partition the rows of
$\mathcal{P}$ into 15 blocks, each block with index set $I_j$ of size $|I_j|$,
so that each sub-matrix is well-localized within a single block. Since
$P^{(1)}$ and $P^{(15)}$ correspond to samples of the 3-bar sequence "101" or
"black-white-black", we have $|I_1| = |I_{15}| = 3r$. The sub-matrix $P^{(8)}$
corresponds to samples from the middle 5-bar sequence, so $|I_8| = 5r$. Each
remaining sub-matrix corresponds to samples from a digit of length 7 bars;
therefore $|I_j| = 7r$ for $j \ne 1, 8, 15$.

We can now give a quantitative measure describing how 'block-diagonal' the
forward map is. To this end, let $\varepsilon$ be the infimum of all
$\epsilon > 0$ satisfying both

$$ \Big\| p_k^{(j)} \big|_{[m] \setminus I_j} \Big\|_1 < \epsilon, \quad \text{for all } j \in [15], \ k \in [10], \qquad (13) $$

and

$$ \Big\| \sum_{j'=j+1}^{15} p_{k_{j'}}^{(j')} \big|_{I_j} \Big\|_1 < \epsilon, \quad \text{for all } j \in [15], \text{ and all choices of } k_{j+1}, \dots, k_{15} \in [10]. \qquad (14) $$

The magnitude of $\varepsilon$ indicates to what extent the energy of each
column of $\mathcal{P}$ is localized within its proper block. If there were no
blur, there would be no overlap between blocks and $\varepsilon = 0$.
Simulation results such as those in Figure 4 suggest that for $\alpha = 1$, the
value of $\varepsilon$ in the forward map $\mathcal{P}$ can be expressed in
terms of the oversampling ratio $r$ and Gaussian standard deviation $\sigma$
according to the formula $\varepsilon = (2/5)\sigma r$, at least over the
relevant range of blur $0 \le \sigma \le 1.5$. By linearity of the forward map
with respect to the amplitude $\alpha$, this implies that more generally
$\varepsilon = (2/5)\alpha\sigma r$.

Figure 4: For oversampling ratios $r = 10$ (left) and $r = 20$ (right) and
$\alpha = 1$, the thick line represents the minimal value of $\varepsilon$
satisfying (13) and (14) in terms of $\sigma$. The thin line in each plot
represents the function $(2/5)\sigma r$.



4.2 Column incoherence 



We now highlight another property of the forward map $\mathcal{P}$ that allows
for robust bar code reconstruction. The left-integer and right-integer codes
for the UPC bar code, as enumerated in Table (7), are well-separated by design:
the $\ell_1$-distance between any two distinct codes is greater than or equal
to 2. Consequently, if $D_k$ are the columns of the bar code dictionary
$\mathcal{D}$, then $\min_{k_1 \ne k_2} \| D_{k_1} - D_{k_2} \|_1 = 2$. This
implies for the forward map
$\mathcal{P} = \alpha \, \mathcal{G}(\sigma) \, \mathcal{D}$ that when there is
no blur, i.e. $\sigma = 0$,

$$ \mu := \min_{j, k_1 \ne k_2} \big\| p_{k_1}^{(j)} - p_{k_2}^{(j)} \big\|_1
        = \min_{j, k_1 \ne k_2} \Big\| p_{k_1}^{(j)} \big|_{I_j} - p_{k_2}^{(j)} \big|_{I_j} \Big\|_1
        = 2 \alpha r, \qquad (15) $$

where $r$ is the over-sampling ratio. As the blur increases from zero, the
column separation factor $\mu = \mu(\sigma, \alpha, r)$ decreases smoothly. In
Figure 5 we plot $\mu$ versus $\sigma$ for different oversampling ratios, as
obtained from numerical simulation. Simulations such as these suggest that
$\mu$ closely follows the curve $\mu \approx 2 \alpha r e^{-\sigma}$, at least
in the range $\sigma \le 1$.
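The claim that distinct codes are separated by at least 2 in the $\ell_1$ sense
can be checked directly from the table. The short Python sketch below does so
for the left patterns; the same value holds for the right patterns because
flipping every bit preserves distances. The helper name `l1_distance` is an
illustrative choice.

```python
from itertools import combinations

L_PATTERNS = ["0001101", "0011001", "0010011", "0111101", "0100011",
              "0110001", "0101111", "0111011", "0110111", "0001011"]


def l1_distance(p, q):
    """l1 distance between two 0/1 patterns of equal length."""
    return sum(a != b for a, b in zip(p, q))


# Smallest separation over all pairs of distinct digit patterns.
min_separation = min(l1_distance(p, q) for p, q in combinations(L_PATTERNS, 2))
```

Without blur, each unit bar is sampled $r$ times with amplitude $\alpha$, which
is how the separation 2 between codes becomes $2 \alpha r$ in (15).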



Figure 5: For oversampling ratios $r = 10$ (left) and $r = 20$ (right), we plot
the minimal column separation
$\mu = \min_{k_1 \ne k_2} \| p_{k_1}^{(j)} - p_{k_2}^{(j)} \|_1$ for the forward
map $\mathcal{P} = \mathcal{G}(\sigma)\mathcal{D}$, as a function of the
standard deviation $\sigma$ of the Gaussian kernel. The plots suggest that
$\mu \approx 2 \alpha r e^{-\sigma}$ for $\sigma \le 1$.

5 A simple decoding procedure for UPC bar codes

We know from the bar code determination problem (10) that without additive
noise, the observed data $d$ is the sum of 15 columns from $\mathcal{P}$, one
column from each block $P^{(j)}$. Based on this observation, we will employ a
reconstruction algorithm which, once initialized, selects the column from the
successive block to minimize the norm of the data remaining after the column is
subtracted. This greedy algorithm is described in pseudo-code as follows.



Algorithm 1: Recover UPC Bar Code

initialize:
    for ℓ = 1, 62, 123, x_ℓ ← 1
    else x_ℓ ← 0
    δ ← d
for j = 2 : 7, 9 : 14
    k_min = arg min_k ‖δ − p_k^(j)‖_1
    if j ≤ 7, ℓ ← 1 + 10(j − 2) + k_min
    else ℓ ← 62 + 10(j − 9) + k_min
    x_ℓ ← 1
    δ ← δ − p_{k_min}^(j)
end
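A minimal Python sketch of this greedy procedure is given below, using only the
standard library. Several choices here are assumptions made for illustration
and are not taken from the paper: the function names, the midpoint placement of
the sample times, and the decision to also subtract the known start, middle,
and end columns from the residual, so that the residual matches the form used
in the analysis of Section 6.

```python
import math

L_PATTERNS = ["0001101", "0011001", "0010011", "0111101", "0100011",
              "0110001", "0101111", "0111011", "0110111", "0001011"]
R_PATTERNS = [p.translate(str.maketrans("01", "10")) for p in L_PATTERNS]
# The 15 dictionary blocks: S, six L blocks, M, six R blocks, E.
BLOCKS = [["101"]] + [L_PATTERNS] * 6 + [["01010"]] + [R_PATTERNS] * 6 + [["101"]]


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def forward_map(alpha, sigma, r):
    """Blocks P^(1), ..., P^(15) of alpha * G(sigma) * D (lists of columns)."""
    m = 95 * r
    times = [(k + 0.5) / r for k in range(m)]  # assumed midpoint sampling
    # Blurred image of a single unit-width black bar occupying [i, i + 1].
    bars = [[alpha * (normal_cdf((t - i) / sigma) - normal_cdf((t - i - 1) / sigma))
             for t in times] for i in range(95)]
    blocks, offset = [], 0
    for patterns in BLOCKS:
        columns = []
        for pattern in patterns:
            column = [0.0] * m
            for b, bit in enumerate(pattern):
                if bit == "1":
                    column = [c + g for c, g in zip(column, bars[offset + b])]
            columns.append(column)
        blocks.append(columns)
        offset += len(patterns[0])
    return blocks


def l1_norm(v):
    return sum(abs(a) for a in v)


def greedy_decode(d, blocks):
    """Decode the 12 digits from left to right, one block at a time."""
    delta = list(d)
    digits = []
    for columns in blocks:
        if len(columns) == 1:
            k_min = 0  # known start, middle, or end pattern
        else:
            k_min = min(range(10), key=lambda k: l1_norm(
                [a - b for a, b in zip(delta, columns[k])]))
            digits.append(k_min)
        # Remove the selected column from the residual.
        delta = [a - b for a, b in zip(delta, columns[k_min])]
    return digits


def synthesize(digits, blocks):
    """Noise-free data d = P x for a list of 12 digits."""
    d = [0.0] * len(blocks[0][0])
    remaining = iter(digits)
    for columns in blocks:
        k = 0 if len(columns) == 1 else next(remaining)
        d = [a + b for a, b in zip(d, columns[k])]
    return d
```

A typical use is `blocks = forward_map(1.0, 0.4, 4)`, then
`greedy_decode(synthesize(digits, blocks), blocks)` for a list of 12 digits.
Digits are indexed 0 to 9 here, whereas the pseudo-code indexes the columns of
each block from 1 to 10.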



6 Analysis of the algorithm 

Algorithm 1 recovers the bar code one digit at a time by iteratively scanning
through the observed data. The runtime complexity of the method is dominated
by the 12 calculations of $k_{\min}$ performed by the algorithm's single loop
over the course of its execution. Each one of these calculations of $k_{\min}$
consists of 10 computations of the $\ell_1$-norm of a vector of length $m$.
Thus, the runtime complexity of the algorithm is $O(m)$, and it can be executed
in less than a second for standard UPC bar code proportions.^3

6.1 Recovery of the unknown bar code 

Recall that the 12 unknown digits in the unknown bar code $c$ are represented
by the sparse vector $x$ in $c = \mathcal{D}x$. We already know that
$x_1 = x_{62} = x_{123} = 1$ as these elements correspond to the mandatory
start, middle, and end sequences. Assuming for the moment that the forward map
$\mathcal{P}$ is known, i.e., that both $\alpha$ and $\sigma$ are known, we now
prove that the greedy algorithm will reconstruct the correct bar code from
noisy data $d = \mathcal{P}x + h$, as long as $\mathcal{P}$ is sufficiently
block-diagonal and its columns are sufficiently incoherent. In the next section
we will extend the analysis to the case where $\alpha$ and $\sigma$ are unknown.



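Theorem 1. Suppose $I_1, \dots, I_{15} \subset [m]$ and
$\varepsilon \in \mathbb{R}$ satisfy the conditions (13)-(14). Then, Algorithm 1
will correctly recover a bar code signal $x$ from noisy data
$d = \mathcal{P}x + h$ provided that

$$ \Big\| p_{k_1}^{(j)} \big|_{I_j} - p_{k_2}^{(j)} \big|_{I_j} \Big\|_1 > 2 \Big( \big\| h|_{I_j} \big\|_1 + 2\varepsilon \Big) \qquad (16) $$

for all $j \in [15]$ and $k_1, k_2 \in [10]$ with $k_1 \ne k_2$.

Proof: Suppose that

$$ d = \mathcal{P}x + h = \sum_{j=1}^{15} p_{k_j}^{(j)} + h. $$

Furthermore, denoting $k_j = k_{\min}$ in the for-loop in Algorithm 1, suppose
that $k_2, \dots, k_{j'-1}$ have already been correctly recovered. Then the
residual data, $\delta$, at this stage of the algorithm will be

$$ \delta = p_{k_{j'}}^{(j')} + \delta_{j'} + h, $$

where $\delta_{j'}$ is defined to be

$$ \delta_{j'} = \sum_{j=j'+1}^{15} p_{k_j}^{(j)}. $$

We will now show that the $j'$th execution of the for-loop will correctly
recover $p_{k_{j'}}^{(j')}$, thereby establishing the desired result by
induction.

Suppose that the $j'$th execution of the for-loop incorrectly recovers
$k_{err} \ne k_{j'}$. This happens if

$$ \big\| \delta - p_{k_{err}}^{(j')} \big\|_1 < \big\| \delta - p_{k_{j'}}^{(j')} \big\|_1. $$

In other words, writing $I = I_{j'}$ and $I^c = [m] \setminus I_{j'}$, we have
that

$$ \begin{aligned}
\| \delta_{j'} + h \|_1
 &> \big\| \delta - p_{k_{err}}^{(j')} \big\|_1
  = \Big\| \big( \delta - p_{k_{err}}^{(j')} \big) \big|_{I} \Big\|_1 + \Big\| \big( \delta - p_{k_{err}}^{(j')} \big) \big|_{I^c} \Big\|_1 \\
 &\ge \Big\| p_{k_{j'}}^{(j')} \big|_{I} - p_{k_{err}}^{(j')} \big|_{I} \Big\|_1
      - \big\| \delta_{j'}|_{I} + h|_{I} \big\|_1
      + \big\| \delta_{j'}|_{I^c} + h|_{I^c} \big\|_1
      - \Big\| p_{k_{j'}}^{(j')} \big|_{I^c} - p_{k_{err}}^{(j')} \big|_{I^c} \Big\|_1 \\
 &\ge \Big\| p_{k_{j'}}^{(j')} \big|_{I} - p_{k_{err}}^{(j')} \big|_{I} \Big\|_1
      - \big\| h|_{I} \big\|_1 - 3\varepsilon
      + \big\| \delta_{j'}|_{I^c} + h|_{I^c} \big\|_1
\end{aligned} $$

from conditions (13) and (14). To finish, we simply simultaneously add and
subtract $\| \delta_{j'}|_{I} + h|_{I} \|_1$ from the last expression to arrive
at a contradiction to the supposition that $k_{err} \ne k_{j'}$:

$$ \begin{aligned}
\| \delta_{j'} + h \|_1
 &> \Big\| p_{k_{j'}}^{(j')} \big|_{I} - p_{k_{err}}^{(j')} \big|_{I} \Big\|_1
    - \big\| h|_{I} \big\|_1 - 3\varepsilon
    - \big\| \delta_{j'}|_{I} + h|_{I} \big\|_1 + \| \delta_{j'} + h \|_1 \\
 &\ge \Big\| p_{k_{j'}}^{(j')} \big|_{I} - p_{k_{err}}^{(j')} \big|_{I} \Big\|_1
    - 2 \Big( \big\| h|_{I} \big\|_1 + 2\varepsilon \Big) + \| \delta_{j'} + h \|_1 \\
 &> \| \delta_{j'} + h \|_1. \qquad (17)
\end{aligned} $$

□

^3 In practice, when $\sigma$ is not too large, a 'windowed' vector of length
less than $m$ can be used to approximate $\| \delta - p_k^{(j)} \|_1$ for each
$k, j$. This can reduce the constant of proportionality associated with the
runtime complexity.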


Remark 6.1. Equation (13) implies that

$$ \min_{j, k_1 \ne k_2} \Big\| p_{k_1}^{(j)} \big|_{I_j} - p_{k_2}^{(j)} \big|_{I_j} \Big\|_1
   \ge \min_{j, k_1 \ne k_2} \big\| p_{k_1}^{(j)} - p_{k_2}^{(j)} \big\|_1 - 2\varepsilon = \mu - 2\varepsilon. $$

Thus, the recovery condition (16) in Theorem 1 will hold whenever

$$ \mu - 2\varepsilon > 2 \Big( \max_j \big\| h|_{I_j} \big\|_1 + 2\varepsilon \Big). $$

Using the empirical relationships $\varepsilon = (2/5)\alpha r \sigma$ and
$\mu = 2 \alpha r e^{-\sigma}$,^4 we obtain the following upper bound on the
level of noise sufficient for successful recovery:

$$ \max_{j \in [12]} \big\| h|_{I_j} \big\|_1 < \alpha r \big( e^{-\sigma} - (6/5)\sigma \big). \qquad (18) $$

In practice the Gaussian blur width $2\sqrt{2 \ln 2}\,\sigma$ does not exceed
the minimum width of the bar code, which we have normalized to be 1. This
translates to a maximal standard deviation of $\sigma \approx .425$, and a noise
ceiling in (18) of

$$ \max_{j \in [12]} \big\| h|_{I_j} \big\|_1 < .144 \, \alpha r. \qquad (19) $$

This should be compared to the $\ell_1$-norm of the bar code signal over a
block; the average $\ell_1$ norm between the left-integer and right-integer
codes is $3.5\alpha$.
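The arithmetic behind (18) and (19) can be reproduced in a few lines. The
Python sketch below evaluates the right-hand side of (18) under the two
empirical relationships quoted in the remark; the function name is an
illustrative choice.

```python
import math


def noise_ceiling(alpha, r, sigma):
    """Right-hand side of (18): alpha * r * (exp(-sigma) - (6/5) * sigma)."""
    return alpha * r * (math.exp(-sigma) - 1.2 * sigma)


# Standard deviation at which the blur width 2*sqrt(2 ln 2)*sigma equals 1.
sigma_max = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# Noise ceiling per unit of alpha * r at that standard deviation.
ceiling_per_unit = noise_ceiling(1.0, 1.0, sigma_max)
```

The ceiling decreases as $\sigma$ grows and becomes negative once
$e^{-\sigma} < (6/5)\sigma$, at which point the sufficient condition no longer
gives any guarantee.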



^4 See equation (15) for the definition of $\mu$.
Remark 6.2. In practice it may be beneficial to apply Algorithm 1 several
times, each time changing the order in which the digits are decoded. For
example, if the distribution of the noise is known in advance, it would be
beneficial to initialize the algorithm in regions of the bar code with less
noise.



6.2 Stability of the greedy algorithm with respect to parameter estimation

Insensitivity to unknown $\alpha$

In the previous section we assumed a known Gaussian convolution matrix
$\alpha \mathcal{G}(\sigma)$. In fact, this is generally not the case. In
practice both $\sigma$ and $\alpha$ must be estimated since these parameters
depend on the distance from the scanner to the bar code, the reflectivity of
the scanned surface, the ambient light, etc. This means that in practice,
Algorithm 1 will be decoding bar codes using only an approximation to
$\alpha \mathcal{G}(\sigma)$. Suppose that the true standard deviation
generating a sampled sequence $d$ is $\sigma$, but that Algorithm 1 uses a
different value $\hat\sigma$ for reconstruction. We can regard the error
incurred by $\hat\sigma$ as additional additive noise in our sensitivity
analysis, setting
$h' = h + \alpha (\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma)) \mathcal{D}x$
and rewriting the inverse problem as

$$ \begin{aligned}
d &= \alpha \mathcal{G}(\sigma) \mathcal{D}x + h \\
  &= \alpha \mathcal{G}(\hat\sigma) \mathcal{D}x + \big( h + \alpha (\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma)) \mathcal{D}x \big) \\
  &= \alpha \mathcal{G}(\hat\sigma) \mathcal{D}x + h'. \qquad (20)
\end{aligned} $$

We now describe a procedure for estimating $\alpha$. Note that the middle
portion of the observed data of length $5r$, $d_{mid} = d|_{I_8}$, represents a
blurry image of the known middle pattern M = [01010]. Let
$\hat{\mathcal{P}} = \mathcal{G}(\hat\sigma)\mathcal{D}$ be the forward map
generated by the estimate $\hat\sigma$ when $\alpha = 1$, and consider the
sub-matrix

$$ p_{mid} = \hat{P}^{(8)} \big|_{I_8}, $$

which is also a vector of length $5r$. If $\hat\sigma = \sigma$ or
$\hat\sigma \approx \sigma$,^5 we expect a good estimate for $\alpha$ to be the
least squares solution

$$ \hat\alpha = \arg\min_{a} \| a \, p_{mid} - d_{mid} \|_2 = p_{mid}^T d_{mid} / \| p_{mid} \|_2^2. \qquad (21) $$
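The closed form in (21) is the usual one-parameter least-squares fit, a ratio
of two inner products. A minimal Python sketch follows; the function name and
the plain-list representation of the two vectors are illustrative assumptions.

```python
def estimate_alpha(p_mid, d_mid):
    """Least-squares scale a minimizing ||a * p_mid - d_mid||_2, as in (21)."""
    numerator = sum(p * d for p, d in zip(p_mid, d_mid))
    denominator = sum(p * p for p in p_mid)
    return numerator / denominator
```

If `d_mid` is an exact multiple of `p_mid`, the estimate recovers that multiple;
with noise present it returns the scale that minimizes the squared residual
over the middle block.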



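Dividing both sides of the equation (20) by $\hat\alpha$, the inverse problem
becomes

$$ \frac{d}{\hat\alpha} = \frac{\alpha}{\hat\alpha} \, \mathcal{G}(\hat\sigma) \mathcal{D}x + \frac{1}{\hat\alpha} h'. \qquad (22) $$

Suppose that $1 - \gamma \le \alpha/\hat\alpha \le 1 + \gamma$ for some
$0 < \gamma < 1$. Then fixing the data to be $\hat{d} = d/\hat\alpha$ and fixing
the forward map to be $\hat{\mathcal{P}} = \mathcal{G}(\hat\sigma)\mathcal{D}$,
the recovery conditions (13), (14), and (16) become respectively

1. $\Big\| \hat{p}_k^{(j)} \big|_{[m] \setminus I_j} \Big\|_1 < \dfrac{\varepsilon}{1 + \gamma}$
   for all $j \in [15]$ and $k \in [10]$.

2. $\Big\| \sum_{j'=j+1}^{15} \hat{p}_{k_{j'}}^{(j')} \big|_{I_j} \Big\|_1 < \dfrac{\varepsilon}{1 + \gamma}$
   for all $j \in [14]$ and valid $k_{j'} \in [10]$.

3. $\Big\| \hat{p}_{k_1}^{(j)} \big|_{I_j} - \hat{p}_{k_2}^{(j)} \big|_{I_j} \Big\|_1 > \dfrac{2}{1 - \gamma} \Big( \dfrac{\| h|_{I_j} \|_1}{\hat\alpha} + \dfrac{\alpha}{\hat\alpha} \Big\| (\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma)) \mathcal{D}x \big|_{I_j} \Big\|_1 + 2\varepsilon \Big)$
   for all $j \in [15]$ and $k_1 \ne k_2$.

Consequently, if $\hat\sigma \approx \sigma$ and $\hat\alpha \approx \alpha$,
the conditions for correct bar code reconstruction do not change much.

^5 Here we have assumed that the noise level is low. In noisier settings it
should be possible to develop more effective methods for estimating $\alpha$ by
incorporating the characteristics of the scanning noise.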



Insensitivity to unknown $\sigma$

We have seen that one way to estimate the scaling $\alpha$ is to guess a value
for $\sigma$ and perform a least-squares fit of the observed data. In doing so,
we found that the sensitivity of the recovery process with respect to $\sigma$
is proportional to the quantity

$$ \Big\| (\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma)) \mathcal{D}x \big|_{I_j} \Big\|_1 \qquad (23) $$

in the third condition immediately above. Note that all the entries of the
matrix $\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma)$ will be small whenever
$\hat\sigma \approx \sigma$. Thus, Algorithm 1 should be able to tolerate small
parameter estimation errors as long as the "almost" block diagonal matrix
formed using $\hat\sigma$ exhibits a sizable difference between any two of its
digit columns which might (approximately) appear in any position of a given UPC
bar code.



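To get a sense of the size of the term (23), let us further investigate the
expressions involved. Recall that using the dictionary matrix $\mathcal{D}$, a
bar code sequence of 0's and 1's is given by $c = \mathcal{D}x$. When put
together with the bar code function representation (3), we see that

$$ [\mathcal{G}(\sigma)\mathcal{D}x]_i = \int_0^n g_\sigma(t_i - t) \, z(t) \, dt, $$

where

$$ g_\sigma(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-t^2/(2\sigma^2)}. $$

Therefore, we have

$$ [\mathcal{G}(\sigma)\mathcal{D}x]_i = \sum_{j=1}^{n} c_j \int_{j-1}^{j} g_\sigma(t_i - t) \, dt. \qquad (24) $$

Now, using the definition for the cumulative distribution function for normal
distributions

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt, $$

we see that

$$ \int_{j-1}^{j} g_\sigma(t_i - t) \, dt = \Phi\Big( \frac{t_i - j + 1}{\sigma} \Big) - \Phi\Big( \frac{t_i - j}{\sigma} \Big), $$

and we can now rewrite (24) as

$$ [\mathcal{G}(\sigma)\mathcal{D}x]_i = \sum_{j=1}^{n} c_j \Big[ \Phi\Big( \frac{t_i - j + 1}{\sigma} \Big) - \Phi\Big( \frac{t_i - j}{\sigma} \Big) \Big]. $$

We now isolate the term we wish to analyze:

$$ [(\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma))\mathcal{D}x]_i = \sum_{j=1}^{n} c_j \Big[ \Phi\Big( \frac{t_i - j + 1}{\sigma} \Big) - \Phi\Big( \frac{t_i - j + 1}{\hat\sigma} \Big) - \Phi\Big( \frac{t_i - j}{\sigma} \Big) + \Phi\Big( \frac{t_i - j}{\hat\sigma} \Big) \Big]. $$

We are interested in the error. Summing by parts with the convention
$c_0 = c_{n+1} = 0$, and using $|c_{j+1} - c_j| \le 1$, we obtain

$$ \big| [(\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma))\mathcal{D}x]_i \big| \le \sum_{j=0}^{n} \Big| \Phi\Big( \frac{t_i - j}{\sigma} \Big) - \Phi\Big( \frac{t_i - j}{\hat\sigma} \Big) \Big|. $$

Suppose that $\xi = (\xi_k)$ is the vector of values $|t_i - j|$ for fixed $i$,
running $j$, sorted in order of increasing magnitude. Note that $\xi_1$ and
$\xi_2$ are less than or equal to 1, and $\xi_3 \le \xi_1 + 1$,
$\xi_4 \le \xi_2 + 1$, and so on. We can center the previous bound around
$\xi_1$ and $\xi_2$, giving

$$ \big| [(\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma))\mathcal{D}x]_i \big| \le \sum_{j=0}^{n} \Big( \Big| \Phi\Big( \frac{\xi_1 + j}{\sigma} \Big) - \Phi\Big( \frac{\xi_1 + j}{\hat\sigma} \Big) \Big| + \Big| \Phi\Big( \frac{\xi_2 + j}{\sigma} \Big) - \Phi\Big( \frac{\xi_2 + j}{\hat\sigma} \Big) \Big| \Big). \qquad (25) $$

Next we simply majorize the expression

$$ f(x) = \Phi\Big( \frac{x}{\sigma} \Big) - \Phi\Big( \frac{x}{\hat\sigma} \Big). $$

To do so, we take the derivative and find the critical points, which turn out to
be

$$ x^* = \pm \sqrt{2} \, \sigma \hat\sigma \sqrt{ \frac{\log \sigma - \log \hat\sigma}{\sigma^2 - \hat\sigma^2} }. $$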
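Therefore, each term in the summand (25) can be bounded by

$$ \Big| \Phi\Big( \frac{\xi + j}{\sigma} \Big) - \Phi\Big( \frac{\xi + j}{\hat\sigma} \Big) \Big|
   \le \Big| \Phi\Big( \sqrt{2} \, \hat\sigma \sqrt{ \frac{\log \sigma - \log \hat\sigma}{\sigma^2 - \hat\sigma^2} } \Big) - \Phi\Big( \sqrt{2} \, \sigma \sqrt{ \frac{\log \sigma - \log \hat\sigma}{\sigma^2 - \hat\sigma^2} } \Big) \Big|
   =: \Delta_1(\sigma, \hat\sigma). \qquad (26) $$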
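The bound $\Delta_1$ is cheap to evaluate, and it can be sanity-checked against
a brute-force maximum of $|\Phi(x/\sigma) - \Phi(x/\hat\sigma)|$ on a grid. The
Python sketch below does both. The function names are illustrative, and the
formula assumes $\sigma \ne \hat\sigma$ (otherwise the quotient is undefined
and the bound is trivially zero).

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def delta_1(sigma, sigma_hat):
    """Value of |Phi(x/sigma) - Phi(x/sigma_hat)| at its critical point."""
    root = math.sqrt((math.log(sigma) - math.log(sigma_hat))
                     / (sigma ** 2 - sigma_hat ** 2))
    return abs(normal_cdf(math.sqrt(2.0) * sigma_hat * root)
               - normal_cdf(math.sqrt(2.0) * sigma * root))


def gap(x, sigma, sigma_hat):
    """The quantity being majorized, evaluated at a single point x."""
    return abs(normal_cdf(x / sigma) - normal_cdf(x / sigma_hat))
```

Evaluating `gap` on a fine grid of nonnegative `x` values should never exceed
`delta_1` for the same pair of standard deviations.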



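On the other hand, the terms in the sum decrease exponentially as $j$
increases. To see this, recall the simple bound

$$ 1 - \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2} \, dt \le \frac{1}{\sqrt{2\pi}} \int_x^\infty \frac{t}{x} \, e^{-t^2/2} \, dt = \frac{e^{-x^2/2}}{x \sqrt{2\pi}}. $$

Writing $\sigma_{\max} = \max\{\sigma, \hat\sigma\}$, and noting that $\Phi(x)$
is a positive, increasing function, we have for $\xi \in [0, 1)$

$$ \begin{aligned}
\Big| \Phi\Big( \frac{\xi + j}{\sigma} \Big) - \Phi\Big( \frac{\xi + j}{\hat\sigma} \Big) \Big|
 &\le 1 - \Phi\Big( \frac{\xi + j}{\sigma_{\max}} \Big) \\
 &\le \frac{\sigma_{\max}}{(\xi + j)\sqrt{2\pi}} \, e^{-(\xi + j)^2/(2\sigma_{\max}^2)} \\
 &\le \frac{\sigma_{\max}}{j\sqrt{2\pi}} \, e^{-j^2/(2\sigma_{\max}^2)} \\
 &\le \frac{\sigma_{\max}}{j\sqrt{2\pi}} \Big( e^{-1/(2\sigma_{\max}^2)} \Big)^{j}
 =: \Delta_2(\sigma_{\max}, j). \qquad (27)
\end{aligned} $$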



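Combining the bounds (26) and (27),

$$ \Big| \Phi\Big( \frac{\xi + j}{\sigma} \Big) - \Phi\Big( \frac{\xi + j}{\hat\sigma} \Big) \Big| \le \min\big( \Delta_1(\sigma, \hat\sigma), \ \Delta_2(\sigma_{\max}, j) \big). $$

Suppose that $j_1$ is the smallest integer in absolute value such that
$\Delta_2(\sigma_{\max}, j_1) \le \Delta_1(\sigma, \hat\sigma)$. Then from this
term on, the summands in (25) can be bounded by a geometric series:

$$ \sum_{j \ge j_1} \Big| \Phi\Big( \frac{\xi + j}{\sigma} \Big) - \Phi\Big( \frac{\xi + j}{\hat\sigma} \Big) \Big|
   \le \sum_{j \ge j_1} \Delta_2(\sigma_{\max}, j)
   \le \frac{\sigma_{\max}}{j_1 \sqrt{2\pi}} \sum_{j \ge j_1} \Big( e^{-1/(2\sigma_{\max}^2)} \Big)^{j}. $$

We then arrive at the bound

$$ \big| [(\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma))\mathcal{D}x]_i \big| \le 2 j_1 \Delta_1(\sigma, \hat\sigma) + \dots =: B(\sigma, \hat\sigma). \qquad (28) $$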
The term (23) can then be bounded according to

$$ \Big\| (\mathcal{G}(\sigma) - \mathcal{G}(\hat\sigma)) \mathcal{D}x \big|_{I_j} \Big\|_1 \le |I_j| \, B(\sigma, \hat\sigma) \le 7r \, B(\sigma, \hat\sigma), \qquad (29) $$

where $r = m/n$ is the over-sampling rate.

Recall that in practice the width $2\sqrt{2 \ln 2}\,\sigma$ of the Gaussian
kernel is on the order of the minimum bar width in the bar code, which we
normalized to 1. When the blur exactly equals the minimum bar width, we arrive
at $\sigma \approx .425$. Below, we compute the error bound
$B(\sigma, \hat\sigma)$ for $\sigma = .425$ and several values of $\hat\sigma$.



$\hat\sigma$            0.2     0.4     0.5     0.6     0.8
$B(.425, \hat\sigma)$   .3453   .0651   .1608   .3071   .589



While the bound (29) is very rough, note that the tabulated error bounds
incurred by inaccurate $\hat\sigma$ are at least roughly the same order of
magnitude as the empirical noise level tolerance for the greedy algorithm, as
discussed in Remark 6.1.



7 Numerical Evaluation 

In this section we illustrate with numerical examples the robustness of the
greedy algorithm to signal noise and imprecision in the $\alpha$ and $\sigma$
parameter estimates. We assume that neither $\alpha$ nor $\sigma$ is known a
priori, but that we have an estimate $\hat\sigma$ for $\sigma$. We then compute
an estimate $\hat\alpha$ from $\hat\sigma$ by solving the least-squares problem
(21).



The phase diagrams in Figure 6 demonstrate the insensitivity of the greedy
algorithm to relatively large amounts of noise. These diagrams were constructed
by executing Algorithm 1 on many trial input signals of the form
$d = \alpha \mathcal{G}(\sigma)\mathcal{D}x + h$, where $h$ is mean zero
Gaussian noise. More specifically, each trial signal, $d$, was formed as
follows: a 12 digit number was first generated uniformly at random, and its
associated blurred bar code, $\alpha \mathcal{G}(\sigma)\mathcal{D}x$, was
formed using the oversampling ratio $r = m/n = 10$. Second, a noise vector $n$
with independent and identically distributed entries $n_j \sim N(0, 1)$ was
generated and then rescaled to form the additive noise vector
$h = \nu \, \| \alpha \mathcal{G}(\sigma)\mathcal{D}x \|_2 \, (n / \| n \|_2)$.
Hence, the parameter
$\nu = \| h \|_2 / \| \alpha \mathcal{G}(\sigma)\mathcal{D}x \|_2$ represents
the noise-to-signal ratio of each trial input signal $d$.
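The rescaling step described above can be sketched as follows in Python, using
the standard library's pseudo-random generator. The function name and the
explicit `rng` argument are illustrative assumptions; the paper does not
specify how the random numbers were produced.

```python
import math
import random


def scaled_noise(signal, nu, rng):
    """Gaussian noise h rescaled so that ||h||_2 = nu * ||signal||_2."""
    n = [rng.gauss(0.0, 1.0) for _ in signal]
    norm_n = math.sqrt(sum(v * v for v in n))
    norm_signal = math.sqrt(sum(v * v for v in signal))
    return [nu * norm_signal * v / norm_n for v in n]


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))
```

Given a noise-free signal `s`, a trial input is then
`[a + b for a, b in zip(s, scaled_noise(s, nu, rng))]`, whose noise-to-signal
ratio equals `nu` by construction.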

We note that in laser-based scanners, there are two major sources of noise: 
electronic noise [Hj, which is often modeled as 1/f noise [S], and speckle noise
[10], caused by the roughness of the paper. However, the Gaussian noise used
in our numerical experiments is sufficient for the purpose of this work. 

To create the phase diagrams in Figure [6j the greedy recovery algorithm was 
run on 100 independently generated trial input signals for each of at least 100 
equally spaced (ct, ly) grid points (a 10 x 10 mesh was used for Figure l6[a), and 
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(a) True parameter values: a = .45, o = 1. (b) True parameter values; cr = .75, a = 1 



Figure 6: Recovery probabilities when $\alpha = 1$ for two true $\sigma$
settings. The shade in each phase diagram corresponds to the probability that
the greedy algorithm will correctly recover a randomly selected bar code, as a
function of the relative noise-to-signal level,
$\nu = \|h\|_2 / \|\alpha\mathcal{G}(\sigma)\mathcal{D}x\|_2$, and the
$\sigma$ estimate, $\hat{\sigma}$. Black represents correct bar code recovery
with probability 1, while pure white represents recovery with probability 0.
Each data point's shade (i.e., probability estimate) is based on 100 random
trials.

a $20 \times 20$ mesh for Figure 6(b)). The number of times the greedy
algorithm successfully recovered the original UPC bar code determined the
color of each region in the $(\hat{\sigma}, \nu)$-plane. The black regions in
the phase diagrams indicate regions of parameter values where all 100 of the
100 randomly generated bar codes were correctly reconstructed. The pure white
parameter regions indicate where the greedy recovery algorithm failed to
correctly reconstruct any of the 100 randomly generated bar codes.
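The way each phase diagram is assembled can be sketched as a nested loop over
grid points with a fixed number of trials per point. The sketch below is not
the code used for Figure 6: `recovery_grid` is a hypothetical helper, and
`toy_trial` is a made-up success model that merely stands in for "generate a
trial signal and run Algorithm 1 on it".

```python
import random


def recovery_grid(run_trial, sigma_hats, noise_levels, trials=100):
    """Estimate the recovery probability at each (sigma_hat, noise) grid point.

    run_trial(sigma_hat, noise) must return True when decoding succeeds.
    Rows of the result follow noise_levels; columns follow sigma_hats.
    """
    grid = []
    for noise in noise_levels:
        row = []
        for sigma_hat in sigma_hats:
            successes = sum(
                1 for _ in range(trials) if run_trial(sigma_hat, noise)
            )
            row.append(successes / trials)
        grid.append(row)
    return grid


# Hypothetical stand-in for one decoding trial (NOT the greedy algorithm):
# success becomes less likely as the noise and the sigma error grow.
rng = random.Random(2)
TRUE_SIGMA = 0.45


def toy_trial(sigma_hat, noise):
    p_success = max(0.0, 1.0 - 2.0 * noise - abs(sigma_hat - TRUE_SIGMA))
    return rng.random() < p_success


sigma_hats = [0.25, 0.45, 0.65]
noise_levels = [0.0, 0.25, 0.5]
grid = recovery_grid(toy_trial, sigma_hats, noise_levels, trials=100)
```

A value of 1.0 in `grid` corresponds to a black cell and 0.0 to a pure white
cell; intermediate values map to shades of gray.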

Looking at Figure 6, we can see that the greedy algorithm appears to be
highly robust to additive noise. For example, when the $\sigma$ estimate is
accurate (i.e., when $\hat{\sigma} \approx \sigma$) the algorithm can tolerate
additive noise with Euclidean norm as high as
$0.25\,\|\alpha\mathcal{G}(\sigma)\mathcal{D}x\|_2$. Furthermore, as
$\hat{\sigma}$ becomes less accurate the greedy algorithm's accuracy appears
to degrade smoothly.

The phase diagrams in Figures 7 and 8 more clearly illustrate how the
reconstruction capabilities of the greedy algorithm depend on $\sigma$,
$\alpha$, the estimate of $\sigma$, and on the noise level. We again consider
Gaussian additive noise on the signal, i.e., we consider the inverse problem
$d = \alpha\mathcal{G}(\sigma)\mathcal{D}x + h$, with independent and
identically distributed $h_j \sim \mathcal{N}(0, \xi^2)$, for several noise
standard deviation levels $\xi \in [0, .63]$. Note that
$E(\|h|_{I_j}\|_1) = 7r\xi\sqrt{2/\pi}$. Thus, the numerical results
are consistent with the bounds in Remark 6.1. Each phase diagram corresponds
to different underlying parameter values $(\sigma, \alpha)$, but in all
diagrams we fix the oversampling ratio at $r = m/n = 10$. As before, the black
regions in the phase diagrams indicate parameter values $(\hat{\sigma}, \xi)$
for which 100 out of 100 randomly generated bar codes were reconstructed, and
white regions indicate parameter values for which 0 out of 100 randomly
generated bar codes were reconstructed.
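The expected block-wise $\ell_1$ norm of the noise quoted above can be checked
numerically. The following sketch assumes that each index block $I_j$ contains
$7r$ samples, as suggested by the factor $7r$ in the formula; the helper names
and the particular values of $r$ and $\xi$ are hypothetical.

```python
import math
import random


def expected_block_l1(r, xi):
    """Closed form 7 * r * xi * sqrt(2 / pi) for a block of 7r N(0, xi^2) entries."""
    return 7 * r * xi * math.sqrt(2.0 / math.pi)


def simulated_block_l1(r, xi, trials, rng):
    """Monte Carlo average of the l1 norm of 7r i.i.d. N(0, xi^2) samples."""
    block_len = 7 * r  # assumed block length
    total = 0.0
    for _ in range(trials):
        total += sum(abs(rng.gauss(0.0, xi)) for _ in range(block_len))
    return total / trials


rng = random.Random(1)
r, xi = 10, 0.3
exact = expected_block_l1(r, xi)
approx = simulated_block_l1(r, xi, trials=2000, rng=rng)
```

The closed form uses only the mean of a half-normal variable,
$E(|h_j|) = \xi\sqrt{2/\pi}$, together with linearity of expectation over the
$7r$ entries of the block.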

Comparing Figures 7(a) and 8(a) with Figures 7(b) and 8(b), respectively,

This follows from the fact that the first raw absolute moment of each $h_j$, $E(|h_j|)$, is $\xi\sqrt{2/\pi}$.

(a) True parameter values: $\sigma = .45$, $\alpha = 1$. (b) True parameter values: $\sigma = .75$, $\alpha = 1$.



Figure 7: Recovery probabilities when $\alpha = 1$ for two true $\sigma$
settings. The shade in each phase diagram corresponds to the probability that
the greedy algorithm correctly recovers a randomly selected bar code, as a
function of the additive noise standard deviation, $\xi$, and the $\sigma$
estimate, $\hat{\sigma}$. Black represents correct bar code recovery with
probability 1, while pure white represents recovery with probability 0. Each
data point's shade (i.e., probability estimate) is based on 100 random trials.



we can see that the greedy algorithm's performance appears to degrade with
increasing $\sigma$. Note that this is consistent with our analysis of the
algorithm in Section 6. Increasing $\sigma$ makes the forward map
$P = \alpha\mathcal{G}(\sigma)\mathcal{D}$ less block diagonal, thereby
increasing the effective value of $\varepsilon$ in conditions (13) and (14).
Hence, condition (15) will be less likely satisfied as $\sigma$ increases.

Comparing Figures 7 and 8 reveals the effect of $\alpha$ on the likelihood
that the greedy algorithm correctly decodes a bar code. As $\alpha$ decreases
from 1 to .25 we see a corresponding deterioration of the greedy algorithm's
ability to handle additive noise of a given fixed standard deviation. This is
entirely expected since $\alpha$ controls the magnitude of the blurred signal
$\alpha\mathcal{G}(\sigma)\mathcal{D}x$. Hence, decreasing $\alpha$
effectively decreases the signal-to-noise ratio of the measured input data $d$.
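The dependence on $\alpha$ can be made explicit with a short calculation. It
assumes that $m$ is the number of samples in $d$ (as in $r = m/n$), that
$\alpha > 0$, and that the entries of $h$ are i.i.d. $\mathcal{N}(0, \xi^2)$
as above; the root-mean-square ratio on the left is introduced here only for
illustration.

```latex
\mathbb{E}\,\|h\|_2^2 = \sum_{j=1}^{m} \mathbb{E}\,h_j^2 = m\,\xi^2,
\qquad
\frac{\sqrt{\mathbb{E}\,\|h\|_2^2}}
     {\|\alpha\,\mathcal{G}(\sigma)\mathcal{D}x\|_2}
  = \frac{\sqrt{m}\,\xi}{\alpha\,\|\mathcal{G}(\sigma)\mathcal{D}x\|_2}.
```

For fixed $\xi$, $\sigma$ and $x$, reducing $\alpha$ from 1 to .25 therefore
multiplies this root-mean-square noise-to-signal ratio by 4.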

Finally, all four of the phase diagrams in Figures 7 and 8 demonstrate how
the greedy algorithm's probability of successfully recovering a randomly
selected bar code varies as a function of the noise standard deviation, $\xi$,
and $\sigma$ estimation error, $|\hat{\sigma} - \sigma|$. As both the noise
level and $\sigma$ estimation error increase, the performance of the greedy
algorithm smoothly degrades. Most importantly, we can see that the greedy
algorithm is relatively robust to inaccurate $\hat{\sigma}$ estimates at low
noise levels. When $\xi \approx 0$ the greedy algorithm appears to suffer only
a moderate decline in reconstruction rate even when
$|\hat{\sigma} - \sigma| \approx \sigma$.

Figure 9 gives examples of two bar codes which the greedy algorithm
correctly recovers when $\alpha = 1$, one for each value of $\sigma$ presented
in Figure 7. In each of these examples the noise standard deviation, $\xi$,
and estimated $\sigma$ value, $\hat{\sigma}$, were chosen so that they
correspond to dark regions of the example's associated phase diagram in
Figure 7. Hence, these two examples represent noisy recovery problems for
which the greedy algorithm correctly decodes the underlying UPC bar code with
relatively high probability. Similarly, Figure 10 gives

The $\xi$ and $\hat{\sigma}$ values were chosen to correspond to dark regions in a Figure 7 phase diagram,

(a) True parameter values: $\sigma = .45$, $\alpha = .25$. (b) True parameter values: $\sigma = .75$, $\alpha = .25$.

Figure 8: Recovery probabilities when $\alpha = .25$ for two true $\sigma$
settings. The shade in each phase diagram corresponds to the probability that
the greedy algorithm will correctly recover a randomly selected bar code, as a
function of the additive noise standard deviation, $\xi$, and the $\sigma$
estimate, $\hat{\sigma}$. Black represents correct bar code recovery with
probability 1, while pure white represents recovery with probability 0. Each
data point's shade (i.e., probability estimate) is based on 100 random trials.

examples of two bar codes which the greedy algorithm correctly recovered
when $\alpha = 0.25$. Each of these examples has parameters that correspond to
a dark region in one of the Figure 8 phase diagrams.

8 Discussion 

In this work, we present a greedy algorithm for the recovery of bar codes from 
signals measured with a laser-based scanner. So far we have shown that the 
method is robust to both additive Gaussian noise and parameter estimation 
errors. There are several issues that we have not addressed that deserve further 
investigation. 

First, we assumed that the start of the signal is well determined. By the start 
of the signal, we mean the time on the recorded signal that corresponds to when 
the laser first strikes a black bar. This assumption may be overly optimistic if 
there is a lot of noise in the signal. Preliminary numerical experiments suggest 
that the algorithm is not overly sensitive to uncertainties in the start time, and 
we are currently working on the development of a fast preprocessing algorithm 
for locating the start position from the samples. 

Second, while our investigation shows that the algorithm is not sensitive to
the parameter $\sigma$ in the model, we did not address the best means for
obtaining reasonable approximations to $\sigma$. In applications where the
scanner distance from the bar code may vary (e.g., with handheld scanners)
other techniques for determining $\hat{\sigma}$ will be required. Given the
robustness of the algorithm to parameter estimation errors it may be
sufficient to simply fix $\hat{\sigma}$ to be the expected optimal $\sigma$
parameter value in such situations. In situations where more
accuracy is required, the hardware might be called on to provide an estimate



not necessarily to purely black regions.

[Figure 9 panels (a) and (b), each showing "Measured Data" and "Correctly Reconstructed Bar Code".]

(a) True parameter values: $\sigma = .45$, $\alpha = 1$. Estimated
$\hat{\sigma} = .3$ and noise standard deviation $\xi = .3$. Solving the
least-squares problem (21) yields an $\alpha$ estimate of
$\hat{\alpha} = .9445$ from $\hat{\sigma}$. The relative noise-to-signal
level, $\nu = \|h\|_2 / \|\alpha\mathcal{G}(\sigma)\mathcal{D}x\|_2$, is
0.4817.

(b) True parameter values: $\sigma = .75$, $\alpha = 1$. Estimated
$\hat{\sigma} = 1$ and noise standard deviation $\xi = .2$. Solving the
least-squares problem (21) yields an $\alpha$ estimate of
$\hat{\alpha} = 1.1409$ from $\hat{\sigma}$. The relative noise-to-signal
level, $\nu = \|h\|_2 / \|\alpha\mathcal{G}(\sigma)\mathcal{D}x\|_2$, is
0.3362.

Figure 9: Two example recovery problems corresponding to dark regions in
each of the phase diagrams of Figure 7. These recovery problems are examples
of problems with $\alpha = 1$ for which the greedy algorithm correctly decodes
a randomly selected UPC bar code approximately 80% of the time.

[Figure 10 panels (a) and (b), each showing "Measured Data" and "Correctly Reconstructed Bar Code".]

(a) True parameter values: $\sigma = .45$, $\alpha = .25$. Estimated
$\hat{\sigma} = .5$ and noise standard deviation $\xi = .1$. Solving the
least-squares problem (21) yields an $\alpha$ estimate of
$\hat{\alpha} = 0.2050$ from $\hat{\sigma}$. The relative noise-to-signal
level, $\nu = \|h\|_2 / \|\alpha\mathcal{G}(\sigma)\mathcal{D}x\|_2$, is
0.7001.

(b) True parameter values: $\sigma = .75$, $\alpha = .25$. Estimated
$\hat{\sigma} = .8$ and noise standard deviation $\xi = .06$. Solving the
least-squares problem (21) yields an $\alpha$ estimate of
$\hat{\alpha} = 0.3057$ from $\hat{\sigma}$. The relative noise-to-signal
level, $\nu = \|h\|_2 / \|\alpha\mathcal{G}(\sigma)\mathcal{D}x\|_2$, is
0.4316.

Figure 10: Two example recovery problems corresponding to dark regions in
each of the phase diagrams of Figure 8. These recovery problems are examples
of problems with $\alpha = .25$ for which the greedy algorithm correctly
decodes a randomly selected UPC bar code approximately 60% of the time.

of the scanner distance from the bar code it is scanning, which could then be
used to help produce a reasonable $\hat{\sigma}$ value. In any case, we leave
more careful consideration of methods for estimating $\sigma$ to future work.

The final assumption we made was that the intensity distribution is well
modeled by a Gaussian. This may not be sufficiently accurate for some distances
between the scanner and the bar code. Since the intensity profile as a function
of distance can be measured, one can conceivably refine the Gaussian model to
capture the true behavior of the intensities.